High-Performance Library Software for QR Factorization

Authors

Erik Elmroth

Fred G. Gustavson

Abstract

In [5, 6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m × n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5, 6]. In particular, the new version is optimized in a number of ways to improve performance, e.g., for small matrices and matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10-65% for square matrices and by 10-100% for tall and thin matrices on the IBM POWER2 and POWER3 nodes. The tests covered matrix sizes ranging from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
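The recursive idea in the abstract can be illustrated with a small sketch. It is not the ESSL or RGEQR3 code: it assumes NumPy, assumes m ≥ n ≥ 1, omits the mini blocking and all performance tuning, and the names `householder` and `rgeqr3_sketch` are hypothetical. It splits the columns in half, factors the left half, updates and factors the right half, and merges the two halves into one aggregated (compact WY) transform Q = I − Y T Yᵀ.

```python
import numpy as np

def householder(x):
    """Return (v, tau, beta) with (I - tau*v*v^T) x = beta*e_1 and v[0] = 1."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = np.zeros_like(x)
        v[0] = 1.0
        return v, 0.0, 0.0          # tau = 0 gives the identity
    beta = -np.copysign(normx, x[0])  # sign chosen to avoid cancellation
    v = x / (x[0] - beta)
    v[0] = 1.0
    tau = (beta - x[0]) / beta
    return v, tau, beta

def rgeqr3_sketch(A):
    """Recursive QR of an m x n matrix (m >= n >= 1).

    Returns (Y, T, R) such that Q = I - Y T Y^T is orthogonal and
    A = Q[:, :n] @ R, with R upper triangular.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    if n == 1:
        # Base case: a single Householder reflector.
        v, tau, beta = householder(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])

    n1 = n // 2
    n2 = n - n1
    # Factor the left half of the columns.
    Y1, T1, R1 = rgeqr3_sketch(A[:, :n1])
    # Apply Q1^T = I - Y1 T1^T Y1^T to the right half.
    A2 = A[:, n1:] - Y1 @ (T1.T @ (Y1.T @ A[:, n1:]))
    # Factor the trailing part of the updated right half.
    Y2_low, T2, R2 = rgeqr3_sketch(A2[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n2)), Y2_low])
    # Merge: Q1 Q2 = I - Y T Y^T with T = [[T1, T3], [0, T2]].
    T3 = -T1 @ (Y1.T @ Y2) @ T2
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [np.zeros((n2, n1)), T2]])
    R = np.block([[R1, A2[:n1, :]], [np.zeros((n2, n1)), R2]])
    return Y, T, R
```

In this sketch the width of the aggregated transform falls out of the recursion itself, which is the sense in which recursion gives "a natural way to choose" it; a tuned routine would stop recursing earlier and use a small blocked kernel for narrow panels.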

Related articles

Multifrontal multithreaded rank-revealing sparse QR factorization

SuiteSparseQR is a sparse multifrontal QR factorization algorithm. Dense matrix methods within each frontal matrix enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. Rank-detection is performed within each frontal matrix using Heath’s method, which does not require colu...

Scalability Issues Affecting the Design of a Dense Linear Algebra Library

This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers. To ensure good scalability and performance, the ScaLAPACK routines are based on block-partitioned...

Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...

Enhancing Parallelism of Tile QR Factorization for Multicore Architectures

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on mod...

Fully Empirical Autotuned QR Factorization For Multicore Architectures

Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures. We show that it is hard to rely on a model, which motivates us to design a fully empirical approac...

Publication date: 2000
